
Handle sequence_lens for GRU on CPU #2479

Merged — 18 commits, Sep 8, 2023
Conversation

@chentong319 (Collaborator) commented Sep 5, 2023

This PR is a quick fix for sequence_lens. According to PyTorch's definition, the padding value is used for every step after a sequence reaches its sequence length. This PR does not try to save computation; I will open another PR that uses scf.if so that all the RNN ops can be handled and computation is saved.
The output of my GRU test case appears to match the PyTorch example.

module{
func.func @main_graph(%arg0: tensor<2x2x1xf32>, %arg1: tensor<1x3x1xf32>, %arg2 : tensor<1x3x1xf32>) -> (tensor<*xf32>, tensor<*xf32>) {
  %lens = onnx.Constant dense<[2, 1]> : tensor<2xi32>
  %cst = "onnx.NoValue"() {value} : () -> none
  %Y, %Y_h = "onnx.GRU"(%arg0, %arg1, %arg2, %cst, %lens, %cst) : ( tensor<2x2x1xf32>, tensor<1x3x1xf32>, tensor<1x3x1xf32>, none, tensor<2xi32>, none) -> (tensor<*xf32>, tensor<*xf32>)
 onnx.Return %Y, %Y_h : tensor<*xf32>, tensor<*xf32>
}
"onnx.EntryPoint"() {func = @main_graph} : () -> ()
}

The output is

The 1st output output_1:[2x1x2x1xfloat32] is: 
 [[[[ 0.0011079 ]
   [-0.00399583]]]


 [[[-0.001489  ]
   [ 0.        ]]]] 

The 2nd output output_1:[1x2x1xfloat32] is: 
 [[[-0.001489]
  [ 0.      ]]]
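To make the padding behavior concrete, here is a minimal pure-Python sketch (not the actual MLIR lowering; `step` is a hypothetical stand-in for the real GRU cell update): once t reaches sequence_lens[b], batch b's hidden state is replaced by the padding value, i.e. the initial hidden state, or zero when none is given.

```python
def run_with_seq_lens(step, xs, seq_lens, h0):
    """Apply `step` over time, padding finished sequences.

    xs: list over time steps; xs[t][b] is the input for batch b at step t.
    h0: initial hidden state per batch (also the padding value in this PR).
    Returns (Y, Y_h): per-step hidden states and the final hidden state.
    """
    h = list(h0)
    ys = []
    for t, xt in enumerate(xs):
        for b in range(len(h)):
            h_new = step(xt[b], h[b])          # normal GRU-cell update (stand-in)
            # Once the sequence is exhausted, emit the padding value instead.
            h[b] = h0[b] if t >= seq_lens[b] else h_new
        ys.append(list(h))
    return ys, h
```

With `step = lambda x, h: h + x`, `seq_lens = [2, 1]`, and a zero initial state, batch 1 is padded with 0 from t = 1 on, matching the zeros in the test output above.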

Limitations: this PR does not use the sequence_lens info to save computation. To do that, I could add an scf.if inside the loops over sequence and batch. However, the existing implementation defines the loop nest over batch and hidden state together, so some effort is needed to break up the loop nest. It is doable, but what is the priority?
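The computation-saving variant would in effect guard the cell update with a per-batch condition. A hypothetical sketch of the control flow an scf.if would express (pure Python for illustration; the real change would restructure the MLIR loop nest, not Python):

```python
def run_skipping(step, xs, seq_lens, h0):
    """Same result as the current lowering, but the cell update is
    skipped entirely for batches whose sequence is already exhausted."""
    h = list(h0)
    ys = []
    for t, xt in enumerate(xs):
        for b in range(len(h)):
            if t < seq_lens[b]:              # corresponds to the scf.if guard
                h[b] = step(xt[b], h[b])     # compute only for live sequences
            else:
                h[b] = h0[b]                 # just write the padding value
        ys.append(list(h))
    return ys, h
```

The outputs are identical to the unguarded version; the guard only avoids dead cell updates.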

Question: should the final result be modified according to sequence_lens? For example, should the 2nd output be [[[-0.001489] [-0.00399583]]]? I did not find any specification for that.
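For comparison, the alternative semantics raised in this question, where Y stays padded but Y_h keeps each batch's last valid hidden state (the way PyTorch's packed-sequence final hidden state behaves), differs by only one line. Again a hypothetical pure-Python sketch, not a statement of what the ONNX spec requires:

```python
def run_keep_last(step, xs, seq_lens, h0):
    """Pad Y after sequence_lens, but let Y_h keep the last valid state."""
    h = list(h0)                             # carried hidden state
    ys = []
    for t, xt in enumerate(xs):
        y_t = []
        for b in range(len(h)):
            if t < seq_lens[b]:
                h[b] = step(xt[b], h[b])     # update only while the sequence is live
                y_t.append(h[b])
            else:
                y_t.append(h0[b])            # Y is still padded...
                # ...but h[b] is left untouched, keeping the last valid state
        ys.append(y_t)
    return ys, h
```

For seq_lens = [2, 1], Y_h would then contain batch 1's hidden state from t = 0 rather than the padding value.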

@chentong319 chentong319 marked this pull request as draft September 5, 2023 18:27
@chentong319 chentong319 changed the title Handel sequence_lens for GRU on CPU Handle sequence_lens for GRU on CPU Sep 5, 2023
@chentong319 chentong319 marked this pull request as ready for review September 6, 2023 00:10
@chentong319 (Collaborator, Author) commented:
Another test case, this time exercising the initial_h input.

module{
func.func @main_graph(%arg0: tensor<4x3x1xf32>, %arg1: tensor<1x6x1xf32>, %arg2 : tensor<1x6x2xf32>) -> (tensor<*xf32>, tensor<*xf32>) {
  %lens = onnx.Constant dense<[2,3,1]> : tensor<3xi32>
  %initial = onnx.Constant dense<[[[0., 1.],[2.0, 3.0],[4.0, 5.0]]]> : tensor<1x3x2xf32>
  %cst = "onnx.NoValue"() {value} : () -> none
  %Y, %Y_h = "onnx.GRU"(%arg0, %arg1, %arg2, %cst, %lens, %initial) : 
    ( tensor<4x3x1xf32>, tensor<1x6x1xf32>, tensor<1x6x2xf32>, none, tensor<3xi32>, tensor<1x3x2xf32>) 
    -> (tensor<*xf32>, tensor<*xf32>)
 onnx.Return %Y, %Y_h : tensor<*xf32>, tensor<*xf32>
}
"onnx.EntryPoint"() {func = @main_graph} : () -> ()
}

The result:

The 1st output output_1:[4x1x3x2xfloat32] is: 
 [[[[1.2503355e-03 4.5813003e-01]
   [8.9796937e-01 1.2981267e+00]
   [1.6873405e+00 2.0353923e+00]]]


 [[[5.4413028e-04 2.1451449e-01]
   [4.2194879e-01 5.9896427e-01]
   [4.0000000e+00 5.0000000e+00]]]


 [[[0.0000000e+00 1.0000000e+00]
   [2.0093442e-01 2.8191486e-01]
   [4.0000000e+00 5.0000000e+00]]]


 [[[0.0000000e+00 1.0000000e+00]
   [2.0000000e+00 3.0000000e+00]
   [4.0000000e+00 5.0000000e+00]]]] 

The 2nd output output_1:[1x3x2xfloat32] is: 
 [[[0. 1.]
  [2. 3.]
  [4. 5.]]] 

Value cond = createMath.sge(
    createMath.cast(sequenceUB.getType(), sequenceIV), sequenceUB);
nextHt = createMath.select(cond, /*padding*/ initial, nextHt);
}
Collaborator commented:
Could we create a common function for this to avoid boilerplate? Then we could call it in other ops such as LSTM and RNN.

Collaborator (Author) replied:
Changed.

@chentong319 (Collaborator, Author) commented:
Now both the first and second outputs of GRU match the torch GRU example.

@tungld (Collaborator) left a comment:
LGTM!

@chentong319 chentong319 merged commit e3a8a67 into onnx:main Sep 8, 2023
4 checks passed
@chentong319 chentong319 deleted the gru-seq-cpu branch September 8, 2023 00:03
@jenkins-droid: Jenkins Linux s390x Build #12569 [push] Handle sequence_lens for... started at 20:04
@jenkins-droid: Jenkins Linux ppc64le Build #11562 [push] Handle sequence_lens for... started at 20:13
@jenkins-droid: Jenkins Linux amd64 Build #12557 [push] Handle sequence_lens for... started at 19:04
@jenkins-droid: Jenkins Linux amd64 Build #12557 [push] Handle sequence_lens for... passed after 1 hr 5 min
@jenkins-droid: Jenkins Linux s390x Build #12569 [push] Handle sequence_lens for... passed after 1 hr 24 min
@jenkins-droid: Jenkins Linux ppc64le Build #11562 [push] Handle sequence_lens for... passed after 1 hr 44 min
